Add PeekByte and Next optimization #32

klauspost · 2024-09-16T14:41:59Z

Adds call to get a single byte and add Next happy path.

Will send msgp PR with benchmarks.

Adds call to get a single byte and add Next happy path.

Utilize philhofer/fwd#32 to peek bytes Eliminate bounds checks, mostly on reading sizes, but also integers. ``` benchmark old MB/s new MB/s speedup BenchmarkReadMapHeaderBytes-32 1136.75 1136.89 1.00x BenchmarkReadArrayHeaderBytes-32 1133.52 1149.34 1.01x BenchmarkTestReadBytesHeader-32 1200.18 1171.21 0.98x BenchmarkReadNilByte-32 2907.50 2176.63 0.75x BenchmarkReadFloat64Bytes-32 4308.86 3869.64 0.90x BenchmarkReadFloat32Bytes-32 2354.55 2332.21 0.99x BenchmarkReadBoolBytes-32 863.66 1124.07 1.30x BenchmarkReadTimeBytes-32 3710.27 3668.07 0.99x BenchmarkReadMapHeader-32 228.74 391.76 1.71x BenchmarkReadArrayHeader-32 243.90 454.31 1.86x BenchmarkReadNil-32 127.42 191.55 1.50x BenchmarkReadFloat64-32 1046.08 1142.49 1.09x BenchmarkReadFloat32-32 595.67 669.65 1.12x BenchmarkReadInt64-32 422.29 798.35 1.89x BenchmarkReadUintWithInt64-32 186.59 371.28 1.99x BenchmarkReadUint64-32 282.87 377.76 1.34x BenchmarkReadIntWithUint64-32 465.30 684.49 1.47x BenchmarkRead16Bytes-32 1021.15 1021.38 1.00x BenchmarkRead256Bytes-32 5324.42 6166.06 1.16x BenchmarkRead2048Bytes-32 8405.41 8382.61 1.00x BenchmarkRead16StringAsBytes-32 1102.72 1321.71 1.20x BenchmarkRead256StringAsBytes-32 6959.78 6627.49 0.95x BenchmarkRead16String-32 320.75 378.46 1.18x BenchmarkRead256String-32 2066.40 2242.78 1.09x BenchmarkReadComplex64-32 900.13 1064.21 1.18x BenchmarkReadComplex128-32 1809.88 2028.11 1.12x BenchmarkReadTime-32 1362.46 1425.79 1.05x ``` Apologies for the noisy benchmark. `BenchmarkReadNilByte` and `BenchmarkReadFloat64Byte` does not touch new code, so it seems to be "micro-bench-itis", where random deltas show up. Bench takes ~30m to run for whatever reason. The compiler is not clever enough to track back to `l := len(b)`, so it inserts a bounds check. Also `len(b) < 3` and `big.Uint16(b[1:])` causes a bounds check. So we rewrite those to avoid it. This should cover the lowest hanging fruits.

* Improve read speed Utilize philhofer/fwd#32 to peek bytes Eliminate bounds checks, mostly on reading sizes, but also integers. ``` benchmark old MB/s new MB/s speedup BenchmarkReadMapHeaderBytes-32 1136.75 1136.89 1.00x BenchmarkReadArrayHeaderBytes-32 1133.52 1149.34 1.01x BenchmarkTestReadBytesHeader-32 1200.18 1171.21 0.98x BenchmarkReadNilByte-32 2907.50 2176.63 0.75x BenchmarkReadFloat64Bytes-32 4308.86 3869.64 0.90x BenchmarkReadFloat32Bytes-32 2354.55 2332.21 0.99x BenchmarkReadBoolBytes-32 863.66 1124.07 1.30x BenchmarkReadTimeBytes-32 3710.27 3668.07 0.99x BenchmarkReadMapHeader-32 228.74 391.76 1.71x BenchmarkReadArrayHeader-32 243.90 454.31 1.86x BenchmarkReadNil-32 127.42 191.55 1.50x BenchmarkReadFloat64-32 1046.08 1142.49 1.09x BenchmarkReadFloat32-32 595.67 669.65 1.12x BenchmarkReadInt64-32 422.29 798.35 1.89x BenchmarkReadUintWithInt64-32 186.59 371.28 1.99x BenchmarkReadUint64-32 282.87 377.76 1.34x BenchmarkReadIntWithUint64-32 465.30 684.49 1.47x BenchmarkRead16Bytes-32 1021.15 1021.38 1.00x BenchmarkRead256Bytes-32 5324.42 6166.06 1.16x BenchmarkRead2048Bytes-32 8405.41 8382.61 1.00x BenchmarkRead16StringAsBytes-32 1102.72 1321.71 1.20x BenchmarkRead256StringAsBytes-32 6959.78 6627.49 0.95x BenchmarkRead16String-32 320.75 378.46 1.18x BenchmarkRead256String-32 2066.40 2242.78 1.09x BenchmarkReadComplex64-32 900.13 1064.21 1.18x BenchmarkReadComplex128-32 1809.88 2028.11 1.12x BenchmarkReadTime-32 1362.46 1425.79 1.05x ``` Apologies for the noisy benchmark. `BenchmarkReadNilByte` and `BenchmarkReadFloat64Byte` does not touch new code, so it seems to be "micro-bench-itis", where random deltas show up. Bench takes ~30m to run for whatever reason. The compiler is not clever enough to track back to `l := len(b)`, so it inserts a bounds check. Also `len(b) < 3` and `big.Uint16(b[1:])` causes a bounds check. So we rewrite those to avoid it. This should cover the lowest hanging fruits. * Fix up ReadExactBytes - do ReadArrayHeaderBytes as well. * Don't stop the timer. Makes benchmarks take forever. Simplify EndlessReader

Add PeekByte

c51132b

Adds call to get a single byte and add Next happy path.

philhofer approved these changes Sep 16, 2024

View reviewed changes

philhofer merged commit 20a13a1 into philhofer:master Sep 16, 2024

klauspost deleted the add-peekbyte branch September 16, 2024 15:00

klauspost mentioned this pull request Sep 16, 2024

Improve read speed tinylib/msgp#361

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add PeekByte and Next optimization #32

Add PeekByte and Next optimization #32

klauspost commented Sep 16, 2024

Add PeekByte and Next optimization #32

Add PeekByte and Next optimization #32

Conversation

klauspost commented Sep 16, 2024